 neural network approximation


Statistical Convergence of Spherical First Hitting Diffusion Models

arXiv.org Machine Learning

Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) are a particular class of denoising diffusion models with random adaptive generation time tailored to generate data on a known manifold. Building on the conditioning framework of Doob's $h$-transform, these models leverage the given information on the target data manifold to demonstrate strong performance across tasks while offering distinct features such as time-homogeneous dynamics of the generating process and a reduced average simulation time. Even though the theoretical investigation of standard forward-backward diffusion models has attracted much attention in the recent past, the statistical convergence properties of FHDMs are not yet understood. In this work, we show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions. In particular, this is the first statistical optimality result for denoising diffusion modelling with random generation time.



Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Neural Information Processing Systems

Temporal difference (TD) learning with function approximation (linear functions or neural networks) has achieved remarkable empirical success, giving impetus to the development of finite-time analysis. As an accelerated version of TD, the adaptive TD has been proposed and proved to enjoy finite-time convergence under linear function approximation. Existing numerical results have demonstrated the superiority of adaptive algorithms over vanilla ones. Nevertheless, the performance guarantee of adaptive TD with neural network approximation remains largely unknown. This paper establishes the finite-time analysis for the adaptive TD with multi-layer ReLU network approximation whose samples are generated from a Markov decision process. Our established theory shows that if the width of the deep neural network is large enough, the adaptive TD using neural network approximation can find the (optimal) value function with high probability under the same iteration complexity as TD in general cases. Furthermore, we show that the adaptive TD using neural network approximation, with the same width and searching area, can achieve theoretical acceleration when the stochastic semi-gradients decay fast.
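To make the idea of an adaptive TD update concrete, here is a minimal sketch. It is not the algorithm analyzed in the paper: the abstract does not state the adaptive rule, so an AdaGrad-style per-coordinate step size is assumed as a stand-in, and one-hot (tabular) features replace the multi-layer ReLU network. The five-state random walk, the function name `adaptive_td0`, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def adaptive_td0(num_episodes=5000, alpha=0.1, gamma=1.0, eps=1e-8, seed=0):
    """TD(0) with one-hot features and AdaGrad-style step sizes (assumed rule).

    Environment (hypothetical): a 5-state random walk that starts in the
    middle, moves left or right with equal probability, and terminates off
    either end. Reward is 1 when exiting on the right, 0 otherwise.
    """
    rng = random.Random(seed)
    n = 5
    theta = [0.5] * n   # one weight per state, i.e. the value estimate
    g_sq = [0.0] * n    # running sum of squared semi-gradients per coordinate

    for _ in range(num_episodes):
        s = n // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, theta[s_next], False

            # TD error and semi-gradient: the bootstrap target is held fixed,
            # so only the coordinate of the current state is non-zero.
            delta = reward + gamma * v_next - theta[s]
            grad = -delta

            # AdaGrad-style scaling: coordinates with large past gradients
            # receive smaller steps.
            g_sq[s] += grad * grad
            theta[s] -= alpha * grad / (math.sqrt(g_sq[s]) + eps)

            if done:
                break
            s = s_next
    return theta


if __name__ == "__main__":
    estimates = adaptive_td0()
    exact = [(i + 1) / 6 for i in range(5)]  # known values for this walk
    for est, val in zip(estimates, exact):
        print(f"estimate {est:.3f}   exact {val:.3f}")
```

For this walk the exact state values are 1/6, 2/6, ..., 5/6, which gives a simple check that the adaptive update settles near the right answer.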


Sharp Lower Bounds for Linearized ReLU^k Approximation on the Sphere

arXiv.org Artificial Intelligence

We prove a saturation theorem for linearized shallow ReLU$^k$ neural networks on the unit sphere $\mathbb S^d$. For any antipodally quasi-uniform set of centers, if the target function has smoothness $r>\tfrac{d+2k+1}{2}$, then the best $\mathcal{L}^2(\mathbb S^d)$ approximation cannot converge faster than order $n^{-\frac{d+2k+1}{2d}}$. This lower bound matches existing upper bounds, thereby establishing the exact saturation order $\tfrac{d+2k+1}{2d}$ for such networks. Our results place linearized neural-network approximation firmly within the classical saturation framework and show that, although ReLU$^k$ networks outperform finite elements under equal degrees $k$, this advantage is intrinsically limited.
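As a quick worked example of the quantities in this abstract, the sketch below evaluates the saturation order $\tfrac{d+2k+1}{2d}$ and the smoothness threshold $\tfrac{d+2k+1}{2}$ for concrete values of $d$ and $k$. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def saturation_order(d: int, k: int) -> Fraction:
    """Exponent s such that the L^2 error cannot decay faster than n^(-s)."""
    return Fraction(d + 2 * k + 1, 2 * d)


def smoothness_threshold(d: int, k: int) -> Fraction:
    """Smoothness r above which the saturation bound applies."""
    return Fraction(d + 2 * k + 1, 2)


if __name__ == "__main__":
    # Sphere S^2 with ReLU (k = 1): order 5/4, threshold 5/2.
    print(saturation_order(2, 1), smoothness_threshold(2, 1))
    # Sphere S^3 with ReLU^2 (k = 2): order 4/3, threshold 4.
    print(saturation_order(3, 2), smoothness_threshold(3, 2))
```

For fixed $d$, raising $k$ increases the order, but for any fixed $k$ the rate stays capped no matter how smooth the target is, which is the saturation phenomenon the abstract describes.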



Provable wavelet-based neural approximation

arXiv.org Machine Learning

Youngmi Hur, Hyojae Lim, Mikyoung Lim. April 24, 2025.

Abstract. In this paper, we develop a wavelet-based theoretical framework for analyzing the universal approximation capabilities of neural networks over a wide range of activation functions. Leveraging wavelet frame theory on the spaces of homogeneous type, we derive sufficient conditions on activation functions to ensure that the associated neural network approximates any function in the given space, along with an error estimate. These sufficient conditions accommodate a variety of smooth activation functions, including those that exhibit oscillatory behavior. Furthermore, by considering the $L^2$-distance between smooth and non-smooth activation functions, we establish a generalized approximation result that is applicable to non-smooth activations, with the error explicitly controlled by this distance. This provides increased flexibility in the design of network architectures.

1 Introduction. Neural networks have long been recognized for their remarkable ability to approximate a wide range of functions, enabling state-of-the-art achievements across various fields in machine learning and artificial intelligence, image processing, natural language processing, and scientific computing (see, for example, [13, 19] and references therein). Various activation functions, such as ReLU, Sigmoid, Tanh, and oscillatory functions, have also been explored to further enhance network performance and adaptability. The versatility of neural networks originates from the structural flexibility of architectures that combine affine transformations with nonlinear activation functions. In addition, classical universal approximation theorems [5, 12, 16] provide a theoretical basis for this flexibility by guaranteeing that, under suitable conditions, neural networks can approximate any continuous function on a bounded domain, underscoring their representational power.

These seminal results have been extended along various directions, including radial basis function (RBF) networks [22, 25], non-polynomial activations [20], approximation of functions and their derivatives [15, 21], the influence of network depth [9], approximation error bounds [1], convolutional neural networks (CNN) [32], and recurrent neural networks (RNN) [27]. As neural network architectures continue to evolve and diversify in practice, their theoretical foundations, beyond those provided by classical approximation theorems, have attracted

Department of Mathematics, Yonsei University, Seoul 03722, Republic of Korea (yhur@yonsei.ac.kr)



Mathematical theory of deep learning

arXiv.org Artificial Intelligence

This book is designed to help students and researchers to quickly familiarize themselves with the area and to provide a foundation for the development of university courses on the mathematics of deep learning. Our main goal in the composition of this book was to present various rigorous, but easy to grasp, results that help to build an understanding of fundamental mathematical concepts in deep learning. To achieve this, we prioritize simplicity over generality. As a mathematical introduction to deep learning, this book does not aim to give an exhaustive survey of the entire (and rapidly growing) field, and some important research directions are missing. In particular, we have favored mathematical results over empirical research, even though an accurate account of the theory of deep learning requires both.


Structure-preserving neural networks for the regularized entropy-based closure of the Boltzmann moment system

arXiv.org Artificial Intelligence

The main challenge of large-scale numerical simulation of radiation transport is the high memory and computation time requirements of discretization methods for kinetic equations. In this work, we derive and investigate a neural network-based approximation to the entropy closure method to accurately compute the solution of the multi-dimensional moment system with a low memory footprint and competitive computational time. We extend methods developed for the standard entropy-based closure to the context of regularized entropy-based closures. The main idea is to interpret structure-preserving neural network approximations of the regularized entropy closure as a two-stage approximation to the original entropy closure. We conduct a numerical analysis of this approximation and investigate optimal parameter choices. Our numerical experiments demonstrate that the method has a much lower memory footprint than traditional methods with competitive computation times and simulation accuracy.


Enhanced physics-informed neural networks with domain scaling and residual correction methods for multi-frequency elliptic problems

arXiv.org Artificial Intelligence

A physics-informed neural network (PINN) combines the constraint-satisfaction ability of partial differential equations (PDEs) with the representation power of deep neural networks to learn solutions of PDEs. PINNs were first introduced in [3, 7, 11] as a way of solving problems in mathematical physics and engineering that can be modeled as PDEs. The idea behind PINNs is to treat the solution of a PDE as an unknown function that can be represented by a neural network. The neural network is then trained end-to-end to satisfy the boundary conditions and PDE constraints. This enables PINNs to deal with problems that are challenging to solve using conventional numerical techniques, such as those with high-dimensional input spaces and complex boundary conditions. Due to the growing need for effective solutions to challenging physical problems in fields like fluid dynamics, structural mechanics, and heat transfer, PINNs have become increasingly popular in recent years. Computational and theoretical studies on PINNs have also been shown to be useful for problems in machine learning, computer vision, and other fields outside physics and engineering due to their flexibility and representational power. PINNs have been applied to a variety of problems in physics, engineering, and other fields, including solving PDEs, modeling physical systems, and carrying out data-driven simulations. However, some obstacles still arise when applying them in computational science and engineering.
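The training objective described above can be sketched for a toy one-dimensional problem. This is an illustrative assumption, not the method of the paper: the model problem $-u''(x) = \pi^2 \sin(\pi x)$ on $[0, 1]$ with $u(0) = u(1) = 0$, the one-hidden-layer tanh network, and the names `net`, `net_xx`, `source`, and `pinn_loss` are all hypothetical. Derivatives are written out by hand so the block needs only the standard library; a practical PINN would use automatic differentiation and an optimizer, both omitted here.

```python
import math


def net(params, x):
    """One-hidden-layer network u(x) = sum_j a_j * tanh(w_j * x + b_j).

    params is a list of (a_j, w_j, b_j) tuples.
    """
    return sum(a * math.tanh(w * x + b) for a, w, b in params)


def net_xx(params, x):
    """Second derivative of the network with respect to x.

    Uses d^2/dz^2 tanh(z) = -2 * tanh(z) * (1 - tanh(z)^2) and the chain rule.
    """
    total = 0.0
    for a, w, b in params:
        t = math.tanh(w * x + b)
        total += a * w * w * (-2.0 * t * (1.0 - t * t))
    return total


def source(x):
    """Right-hand side f(x) of the assumed model problem -u'' = f."""
    return math.pi ** 2 * math.sin(math.pi * x)


def pinn_loss(params, num_points=32, boundary_weight=1.0):
    """Mean squared PDE residual at collocation points plus a boundary penalty."""
    xs = [(i + 0.5) / num_points for i in range(num_points)]
    pde = sum((-net_xx(params, x) - source(x)) ** 2 for x in xs) / num_points
    boundary = net(params, 0.0) ** 2 + net(params, 1.0) ** 2
    return pde + boundary_weight * boundary


if __name__ == "__main__":
    params = [(0.5, 2.0, -1.0), (-0.3, 1.0, 0.2)]  # arbitrary untrained weights
    print("loss at arbitrary weights:", pinn_loss(params))
```

Training would minimize `pinn_loss` over `params`. The loss is zero only if the network satisfies the PDE at every collocation point and both boundary conditions, which is the sense in which the constraints are built into training.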